[wasm2js] Escape module names in generated JavaScript output #8377
sumleo wants to merge 1 commit into WebAssembly:main from
WebAssembly module and base names are arbitrary UTF-8 strings per the spec, and may contain JavaScript string metacharacters such as quotes, backslashes, and newlines. The wasm2js code generator was inserting these names directly into JavaScript string literals without escaping, which could produce syntactically invalid or exploitable JavaScript output. Add an escapeJSString() helper that escapes backslashes, single and double quotes, newlines, carriage returns, and the Unicode line/paragraph separators (U+2028, U+2029). Apply it at all three injection points:
1. emitPreES6: ES6 import 'from' clause (single-quote context)
2. emitPostES6: object literal keys (double-quote context) for imported functions, memories, and tables
3. initActiveSegments: imports['module']['base'] subscript access (single-quote context) for both module and base names
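A minimal sketch of such a helper follows. It assumes names arrive as raw UTF-8 bytes in a std::string_view, matching the escapeJSString signature quoted in the review thread; the PR's actual body is not shown in this conversation, so this implementation is illustrative only. It escapes exactly the set listed above: backslash, both quote characters, and the four characters JavaScript treats as line terminators (LF, CR, U+2028, U+2029), which are the only characters that can break a string literal.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Illustrative sketch (not the PR's actual body): escape a WebAssembly
// module/base name for use inside a '...' or "..." JavaScript string literal.
// The input is treated as raw UTF-8 bytes; anything not handled below is
// copied through unchanged.
static std::string escapeJSString(std::string_view str) {
  std::string out;
  out.reserve(str.size());
  for (std::size_t i = 0; i < str.size(); ++i) {
    char c = str[i];
    switch (c) {
      case '\\': out += "\\\\"; break;
      case '\'': out += "\\'"; break;
      case '"': out += "\\\""; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      default: {
        // U+2028 and U+2029 are UTF-8 E2 80 A8 and E2 80 A9. They are line
        // terminators, and pre-ES2019 engines reject them in string literals.
        bool isSeparator =
          static_cast<unsigned char>(c) == 0xE2 && i + 2 < str.size() &&
          static_cast<unsigned char>(str[i + 1]) == 0x80 &&
          (static_cast<unsigned char>(str[i + 2]) == 0xA8 ||
           static_cast<unsigned char>(str[i + 2]) == 0xA9);
        if (isSeparator) {
          out += static_cast<unsigned char>(str[i + 2]) == 0xA8 ? "\\u2028"
                                                                : "\\u2029";
          i += 2; // skip the two continuation bytes
        } else {
          out += c;
        }
        break;
      }
    }
  }
  return out;
}
```

Escaping both quote characters regardless of context is slightly redundant (a ' inside "..." is harmless), but it lets one helper serve all three sites without the caller passing the quote style.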
I'm not opposed to this if we have a user that would benefit. But AFAIK this has not come up because wasm2js is used on simple modules from LLVM, where this complexity is not needed.
// Escape a string for safe inclusion in a JavaScript string literal.
// WebAssembly module/base names are arbitrary UTF-8 and may contain characters
// that are JS string metacharacters (quotes, backslashes, newlines, etc.).
static std::string escapeJSString(std::string_view str) {
Could we use printEscapedJSON from support/string.h instead? If not, that would be a better place to add new escaping logic.
Fair point: the primary motivation was hardening against unexpected input, not a specific user issue. I understand if this is not a priority given wasm2js usage patterns. Regarding the inline comment:
I would be happy to accept the PR with the escaping moved to string.h/string.cpp. I can imagine that this will be useful again in the future.
Sounds OK to me with that refactoring, plus a test.
Summary
WebAssembly module and base names are arbitrary UTF-8 strings per the spec, and may contain JavaScript string metacharacters such as quotes, backslashes, and newlines. The wasm2js code generator was inserting these names directly into JavaScript string literals without escaping, which could produce syntactically invalid or exploitable JavaScript output.
This PR adds an escapeJSString() helper that escapes backslashes, single/double quotes, newlines, carriage returns, and the Unicode line/paragraph separators (U+2028, U+2029), and applies it at all three affected output points in src/wasm2js.h:
1. emitPreES6: ES6 import ... from '...' clause (single-quote string context). A module name containing ' would break out of the string literal.
2. emitPostES6: object literal keys written as "..." for imported functions, memories, and tables. A module name containing " would break out.
3. initActiveSegments: imports['module']['base'] subscript expressions. Both module and base names are interpolated into single-quote strings without escaping.
Test plan
python3 check.py --binaryen-bin ./build/bin wasm2js)
import->module and import->base in wasm2js.h
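The three output points listed in the summary use two quote styles. An alternative to escaping both quotes everywhere is a quote-aware helper that also emits the delimiters, so a call site cannot pair the wrong escaping with the wrong quote. This is a hypothetical sketch: quoteJS and the three small functions only illustrate the shape of the string each site builds; they are not wasm2js's real function names or its exact output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical quote-aware helper: returns `str` as a complete JavaScript
// string literal delimited by `quote`. Only the active quote character,
// backslash, and JS line terminators (LF, CR, U+2028, U+2029) are escaped.
std::string quoteJS(std::string_view str, char quote) {
  assert(quote == '\'' || quote == '"');
  std::string out(1, quote);
  for (std::size_t i = 0; i < str.size(); ++i) {
    char c = str[i];
    if (c == '\\' || c == quote) {
      out += '\\';
      out += c;
    } else if (c == '\n') {
      out += "\\n";
    } else if (c == '\r') {
      out += "\\r";
    } else if (static_cast<unsigned char>(c) == 0xE2 && i + 2 < str.size() &&
               static_cast<unsigned char>(str[i + 1]) == 0x80 &&
               (static_cast<unsigned char>(str[i + 2]) & 0xFE) == 0xA8) {
      // UTF-8 E2 80 A8 / E2 80 A9 encode U+2028 / U+2029.
      out += static_cast<unsigned char>(str[i + 2]) == 0xA8 ? "\\u2028"
                                                            : "\\u2029";
      i += 2; // skip the two continuation bytes
    } else {
      out += c;
    }
  }
  out += quote;
  return out;
}

// Simplified shapes of the strings built at each site (not wasm2js's exact
// output): emitPreES6, emitPostES6, and initActiveSegments respectively.
std::string importFromClause(std::string_view module) {
  return "from " + quoteJS(module, '\'') + ";";
}
std::string objectLiteralKey(std::string_view module) {
  return quoteJS(module, '"') + ": ";
}
std::string segmentImportAccess(std::string_view module,
                                std::string_view base) {
  return "imports[" + quoteJS(module, '\'') + "][" + quoteJS(base, '\'') + "]";
}
```

Keeping the delimiters inside the helper ties quote style and escaping together in one place; the trade-off is that each call site changes shape, rather than just wrapping an existing interpolation.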